Chunking Clinical Text Containing Non-Canonical Language
Abstract
Free text notes typed by primary care physicians during patient consultations typically contain highly non-canonical language. Shallow syntactic analysis of free text notes can help to reveal valuable information for the study of disease and treatment. We present an exploratory study into chunking such text using off-the-shelf language processing tools and pre-trained statistical models. We evaluate chunking accuracy with respect to part-of-speech tagging quality, choice of chunk representation, and breadth of context features. Our results indicate that narrow context feature windows give the best results, but that chunk representation and minor differences in tagging quality do not have a significant impact on chunking accuracy.
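To make two of the variables named in the abstract concrete, the following is a minimal sketch of one chunk representation (IOB2 tags) and of context features drawn from a window of a given width around a token. The abstract does not say which representations or feature templates were compared, so the IOB2 choice, the function names spans_to_iob2 and window_features, the "<PAD>" marker, and the example note with its part-of-speech tags are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def spans_to_iob2(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Encode chunk spans (start, end-exclusive, label) as IOB2 tags.

    In IOB2 every chunk starts with B-<label>, continues with I-<label>,
    and tokens outside any chunk are tagged O.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def window_features(tokens: List[str], pos_tags: List[str],
                    i: int, width: int) -> Dict[str, str]:
    """Collect word and POS features for positions i-width .. i+width.

    Positions that fall outside the sentence get a padding marker.
    """
    feats: Dict[str, str] = {}
    for offset in range(-width, width + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"w[{offset}]"] = tokens[j].lower() if inside else "<PAD>"
        feats[f"p[{offset}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


# Hypothetical clinical note fragment with made-up POS tags and chunk spans.
tokens = ["pt", "c/o", "chest", "pain", "x3", "days"]
pos_tags = ["NN", "VBZ", "NN", "NN", "CD", "NNS"]
spans = [(0, 1, "NP"), (1, 2, "VP"), (2, 4, "NP"), (4, 6, "NP")]

iob2 = spans_to_iob2(len(tokens), spans)
narrow = window_features(tokens, pos_tags, 2, 1)  # one token either side
wide = window_features(tokens, pos_tags, 2, 2)    # two tokens either side
```

A narrow window (width 1) yields six features per token here and a wider one (width 2) yields ten; varying this width is one way to read "breadth of context features" in the abstract.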
Similar resources
Flexible Text Segmentation with Structured Multilabel Classification
Many language processing tasks can be reduced to breaking the text into segments with prescribed properties. Such tasks include sentence splitting, tokenization, named-entity extraction, and chunking. We present a new model of text segmentation based on ideas from multilabel classification. Using this model, we can naturally represent segmentation problems involving overlapping and non-contiguo...
Shallow Parsing and Text Chunking: a View on Underspecification in Syntax
This paper illustrates a technique of shallow parsing named “text chunking” whereby “parse incompleteness” is reinterpreted as “parse underspecification”. A text is chunked into structured units which can be identified with certainty on the basis of available knowledge. The chunking process stops at that level of granularity beyond which the analysis gets undecidable. We argue that a chunked sy...
An Affinity Based Greedy Approach towards Chunking for Indian Languages
A robust chunker can drastically reduce the complexity of parsing natural language text. Chunking for Indian languages requires a novel approach because of the relatively unrestricted order of words within a word group. A computational framework for chunking based on valency theory and feature structures has been described here. The paper also draws an analogy of chunk formation in free word ...
Examining reading fluency in a foreign language: Effects of text segmentation on L2 readers
Grouping words into meaningful chunks is a fundamental process for fluent reading. The present study is an attempt to understand the relationship between chunking and second language (L2) reading fluency. The effects of text segmentation on comprehension, rate, and regression in L2 reading were investigated using a self-paced reading task in a moving-window condition. The participants were inte...
Improving Biomedical Text Categorisation with NLP
Background: Text categorisation has been used in bioinformatics to help identify documents containing protein-protein interactions. Standard text categorisation methods have used the bag-of-words approach with little input from NLP. While this has proved effective in the past, there is some evidence that the techniques are not adequate in some biological domains. Here we examine how chunking, n...
Publication date: 2014